{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Obfuscation Library\n",
    "\n",
    "Sharing data, creating documents and doing public demonstrations often require that data containing\n",
    "PII or other sensitive material be obfuscated.\n",
    "\n",
    "MSTICPy contains a simple library to obfuscate data using hashing and random mapping of values.\n",
    "You can use these functions on single data items or entire DataFrames.\n",
    "\n",
    "## Contents\n",
    "- [Import the module](#Import-the-module)\n",
    "- [Individual Obfuscation Functions](#Individual-Obfuscation-Functions)\n",
    "- [Obfuscating DataFrames](#Obfuscating-DataFrames)\n",
    "- [Creating custom column mappings](#Creating-custom-mappings)\n",
    "- [Using hash_item with delimiters](#Using-hash_item-with-delimiters-to-preserve-the-structure/look-of-the-hashed-input)\n",
    "- [Checking Your Obfuscation](#Checking-Your-Obfuscation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the module"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from msticpy.common.utility import md\n",
    "from msticpy.data import data_obfus"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read in some data for the examples"
   ]
  },
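  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ast\n",
    "\n",
    "netflow_df = pd.read_csv(\"data/az_net_flows.csv\")\n",
    "\n",
    "# Lists are read from the CSV as strings - parse them back into lists.\n",
    "# ast.literal_eval only accepts Python literals, so unlike eval it cannot run code.\n",
    "def str_to_list(val):\n",
    "    if isinstance(val, str):\n",
    "        return ast.literal_eval(val)\n",
    "    return None  # non-string values (e.g. NaN) become None, as before\n",
    "\n",
    "netflow_df[\"PublicIPs\"] = netflow_df[\"PublicIPs\"].apply(str_to_list)\n",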
    "\n",
    "# Define subset of output columns\n",
    "out_cols = [\n",
    "    'TenantId', 'TimeGenerated', 'FlowStartTime',\n",
    "    'ResourceGroup', 'VMName', 'VMIPAddress', 'PublicIPs',\n",
    "    'SrcIP', 'DestIP', 'L4Protocol', 'AllExtIPs'\n",
    "]\n",
    "netflow_df = netflow_df[out_cols]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Individual Obfuscation Functions\n",
    "\n",
    "Here we're importing the individual functions, but you can also access them\n",
    "through the `data_obfus` module imported above, for example:\n",
    "```\n",
    "data_obfus.hash_string(...)\n",
    "```\n",
    "\n",
    "> **Note** In the next cell we're using a function to output documentation and examples.<br>\n",
    "> You can ignore this. The usage of each function is shown in the output of<br>\n",
    "> the subsequent cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from msticpy.data.data_obfus import (\n",
    "    hash_dict,\n",
    "    hash_ip,\n",
    "    hash_item,\n",
    "    hash_list,\n",
    "    hash_sid,\n",
    "    hash_string,\n",
    "    replace_guid\n",
    ")\n",
    "\n",
    "# Function to automate/format the examples below. You can ignore this\n",
    "def show_func(func, examples):\n",
    "    func_name = func.__name__\n",
    "    if func.__name__.startswith(\"_\"):\n",
    "        func_name = func_name[1:]\n",
    "    md(func_name, \"bold\")\n",
    "    print(func.__doc__)\n",
    "    md(\"Examples\", \"bold\")\n",
    "    for example in examples:\n",
    "        if isinstance(example, tuple):\n",
    "            arg, delim = example\n",
    "            print(\n",
    "                f\"{func_name}('{arg}', delim='{delim}') =>\", func(*example)\n",
    "            )\n",
    "        else:\n",
    "            print(\n",
    "                f\"{func_name}('{example}') =>\", func(example)\n",
    "            )\n",
    "    md(\"<br><hr><br>\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_string</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_string does a simple hash of the input. If the input is a numeric string, the output is also a numeric string.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_string</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash a simple string.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    input_str : str\n",
      "        The input string\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    str\n",
      "        The obfuscated output string\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_string('sensitive data') => jdiqcnrqmlidkd\n",
      "hash_string('42424') => 98478\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_string\", \"large, bold\")\n",
    "md(\"hash_string does a simple hash of the input. If the input is a numeric string, the output is also a numeric string.\")\n",
    "show_func(hash_string, [\"sensitive data\", \"42424\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_item</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_item allows specification of delimiters. Useful for preserving the look of domains, emails, etc.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_item</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash a simple string.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    input_item : str\n",
      "        The input string\n",
      "    delim: str, optional\n",
      "        A string of delimiters to use to split the input string\n",
      "        prior to hashing.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    str\n",
      "        The obfuscated output string\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_item('sensitive data', delim=' ') => kdneqoiia laoe\n",
      "hash_item('most-sensitive-data/here', delim=' /-') => kmea-kdneqoiia-laoe/fcec\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_item\", \"large, bold\")\n",
    "md(\"hash_item allows specification of delimiters. Useful for preserving the look of domains, emails, etc.\")\n",
    "show_func(hash_item, [(\"sensitive data\", \" \"), (\"most-sensitive-data/here\", \" /-\")])"
   ]
  },
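  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The delimiter handling shown above can be sketched with the standard library alone.\n",
    "This is an illustration under stated assumptions, not msticpy's implementation:\n",
    "`sketch_hash_token` and `sketch_hash_item` are hypothetical names, and mapping a\n",
    "SHA-256 digest to letters is an assumption made only to keep the example deterministic.\n",
    "\n",
    "```python\n",
    "import hashlib\n",
    "import re\n",
    "\n",
    "\n",
    "def sketch_hash_token(token):\n",
    "    # Hypothetical helper: map a token to letters a-p, keeping its length.\n",
    "    digest = hashlib.sha256(token.encode('utf-8')).hexdigest()\n",
    "    # Extend the digest for tokens longer than 64 characters.\n",
    "    while len(digest) < len(token):\n",
    "        digest += hashlib.sha256(digest.encode('utf-8')).hexdigest()\n",
    "    return ''.join(chr(ord('a') + int(ch, 16)) for ch in digest[:len(token)])\n",
    "\n",
    "\n",
    "def sketch_hash_item(item, delim=None):\n",
    "    # Hypothetical helper: hash each token, leave delimiter characters in place.\n",
    "    if not delim:\n",
    "        return sketch_hash_token(item)\n",
    "    # The capturing group makes re.split keep the delimiters in the result.\n",
    "    parts = re.split('([' + re.escape(delim) + '])', item)\n",
    "    return ''.join(\n",
    "        part if part in delim else sketch_hash_token(part) for part in parts\n",
    "    )\n",
    "\n",
    "\n",
    "masked = sketch_hash_item('most-sensitive-data/here', delim=' /-')\n",
    "print(masked)\n",
    "```\n",
    "\n",
    "Because only the text between delimiters is replaced, the result keeps the shape of the original value."
   ]
  },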
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_ip</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_ip will output random mappings of input IP V4 and V6 addresses.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>Within a Python session the mapping will remain constant.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_ip</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash IP address or list of IP addresses.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    input_item : Union[List[str], str]\n",
      "        List of IP addresses or single IP address.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    Union[List[str], str]\n",
      "        List of hashed addresses or single address.\n",
      "        (depending on input)\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_ip('192.168.3.1') => 192.168.84.105\n",
      "hash_ip('2001:0db8:85a3:0000:0000:8a2e:0370:7334') => 85d6:7819:9cce:9af1:9af1:24ad:d338:7d03\n",
      "hash_ip('['192.168.3.1', '192.168.5.2', '192.168.10.2']') => ['192.168.84.105', '192.168.172.202', '192.168.232.202']\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_ip\", \"large, bold\")\n",
    "md(\"hash_ip will output random mappings of input IP V4 and V6 addresses.\")\n",
    "md(\"Within a Python session the mapping will remain constant.\")\n",
    "show_func(hash_ip, [\n",
    "    \"192.168.3.1\", \n",
    "    \"2001:0db8:85a3:0000:0000:8a2e:0370:7334\",\n",
    "    [\"192.168.3.1\", \"192.168.5.2\", \"192.168.10.2\"],\n",
    "])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_sid</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_sid will randomize the domain-specific parts of a SID. It preserves built-in SIDs and well known RIDs (e.g. Admins -500)</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_sid</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash a SID preserving well-known SIDs and the RID.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    sid : str\n",
      "        SID string\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    str\n",
      "        Hashed SID\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_sid('S-1-5-21-1180699209-877415012-3182924384-1004') => S-1-5-21-3321821741-636458740-4143214142-1004\n",
      "hash_sid('S-1-5-18') => S-1-5-18\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_sid\", \"large, bold\")\n",
    "md(\"hash_sid will randomize the domain-specific parts of a SID. It preserves built-in SIDs and well known RIDs (e.g. Admins -500)\")\n",
    "show_func(hash_sid, [\"S-1-5-21-1180699209-877415012-3182924384-1004\", \"S-1-5-18\"])"
   ]
  },
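  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The SID behaviour above (domain-specific parts replaced, built-in SIDs and the RID kept)\n",
    "can be sketched as follows. This is a minimal sketch, not msticpy's implementation:\n",
    "`sketch_hash_sid` and `_sid_part_map` are hypothetical names, and it assumes that only SIDs\n",
    "of the form `S-1-5-21-a-b-c-RID` carry domain-specific values.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Hypothetical session cache: original sub-authority -> random replacement.\n",
    "_sid_part_map = {}\n",
    "\n",
    "\n",
    "def sketch_hash_sid(sid):\n",
    "    parts = sid.split('-')\n",
    "    # Assumption: anything that is not S-1-5-21-a-b-c-RID is returned unchanged.\n",
    "    if len(parts) != 8 or parts[:4] != ['S', '1', '5', '21']:\n",
    "        return sid\n",
    "    masked = []\n",
    "    for part in parts[4:7]:\n",
    "        if part not in _sid_part_map:\n",
    "            # Sub-authorities are 32-bit unsigned values.\n",
    "            _sid_part_map[part] = str(random.randint(0, 2**32 - 1))\n",
    "        masked.append(_sid_part_map[part])\n",
    "    # Keep the S-1-5-21 prefix and the trailing RID.\n",
    "    return '-'.join(parts[:4] + masked + parts[7:])\n",
    "\n",
    "\n",
    "print(sketch_hash_sid('S-1-5-21-1180699209-877415012-3182924384-1004'))\n",
    "print(sketch_hash_sid('S-1-5-18'))\n",
    "```\n",
    "\n",
    "Caching the replacements means the same domain maps to the same masked value for the rest of the session."
   ]
  },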
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_list</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_list will randomize a list of items preserving the list structure.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_list</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash list of strings.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    item_list : List[str]\n",
      "        Input list\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    List[str]\n",
      "        Hashed list\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_list('['S-1-5-21-1180699209-877415012-3182924384-1004', 'S-1-5-18']') => ['elkbjiboklpknokdeflikamojqjflqmicqiorqfbqboqe', 'nrllmpbd']\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_list\", \"large, bold\")\n",
    "md(\"hash_list will randomize a list of items preserving the list structure.\")\n",
    "show_func(hash_list, [[\"S-1-5-21-1180699209-877415012-3182924384-1004\", \"S-1-5-18\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>hash_dict</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>hash_dict will randomize a dict of items preserving the structure and the dict keys.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>hash_dict</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "    Hash dictionary values.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    item_dict : Dict[str, Union[Dict[str, Any], List[Any], str]]\n",
      "        Input item can be a Dict of strings, lists or other\n",
      "        dictionaries.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    Dict[str, Any]\n",
      "        Dictionary with hashed values.\n",
      "\n",
      "    \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hash_dict('{'SID1': 'S-1-5-21-1180699209-877415012-3182924384-1004', 'SID2': 'S-1-5-18'}') => {'SID1': 'elkbjiboklpknokdeflikamojqjflqmicqiorqfbqboqe', 'SID2': 'nrllmpbd'}\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"hash_dict\", \"large, bold\")\n",
    "md(\"hash_dict will randomize a dict of items preserving the structure and the dict keys.\")\n",
    "show_func(hash_dict, [{\"SID1\": \"S-1-5-21-1180699209-877415012-3182924384-1004\", \"SID2\": \"S-1-5-18\"}])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<p style='font-size: 130%;font-weight: bold'>replace_guid</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>replace_guid will output a random UUID mapped to the input.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>An input GUID will be mapped to the same newly-generated output UUID</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style=''>You can see that UUID #4 is the same as #1 and mapped to the same output UUID.</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>replace_guid</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "        Replace GUID/UUID with mapped random UUID.\n",
      "\n",
      "        Parameters\n",
      "        ----------\n",
      "        guid : str\n",
      "            Input UUID.\n",
      "\n",
      "        Returns\n",
      "        -------\n",
      "        str\n",
      "            Mapped UUID\n",
      "\n",
      "        \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style='font-weight: bold'>Examples</p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "replace_guid('cf1b0b29-08ae-4528-839a-5f66eca2cce9') => 01ae8633-22e5-480f-b884-fc48588c25d9\n",
      "replace_guid('ed63d29e-6288-4d66-b10d-8847096fc586') => 52cd2814-b5e4-48bd-80f2-51b503e50467\n",
      "replace_guid('ac561203-99b2-4067-a525-60d45ea0d7ff') => ef059dc7-2d6e-4506-8619-05b346a6bc6b\n",
      "replace_guid('cf1b0b29-08ae-4528-839a-5f66eca2cce9') => 01ae8633-22e5-480f-b884-fc48588c25d9\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<p style=''><br><hr><br></p>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "md(\"replace_guid\", \"large, bold\")\n",
    "md(\"replace_guid will output a random UUID mapped to the input.\")\n",
    "md(\"An input GUID will be mapped to the same newly-generated output UUID\")\n",
    "md(\"You can see that UUID #4 is the same as #1 and mapped to the same output UUID.\")\n",
    "show_func(replace_guid, [\n",
    "    \"cf1b0b29-08ae-4528-839a-5f66eca2cce9\",\n",
    "    \"ed63d29e-6288-4d66-b10d-8847096fc586\",\n",
    "    \"ac561203-99b2-4067-a525-60d45ea0d7ff\",\n",
    "    \"cf1b0b29-08ae-4528-839a-5f66eca2cce9\",\n",
    "])"
   ]
  },
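  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The consistent mapping seen in the output above (the repeated input GUID yields the same output UUID)\n",
    "can be sketched with a dictionary cache. This is an illustration only, not msticpy's\n",
    "implementation: `sketch_replace_guid` and `_guid_map` are hypothetical names.\n",
    "\n",
    "```python\n",
    "import uuid\n",
    "\n",
    "# Hypothetical session cache: input GUID -> replacement UUID.\n",
    "_guid_map = {}\n",
    "\n",
    "\n",
    "def sketch_replace_guid(guid):\n",
    "    # Generate a random UUID the first time a GUID is seen, then reuse it.\n",
    "    if guid not in _guid_map:\n",
    "        _guid_map[guid] = str(uuid.uuid4())\n",
    "    return _guid_map[guid]\n",
    "\n",
    "\n",
    "first = sketch_replace_guid('cf1b0b29-08ae-4528-839a-5f66eca2cce9')\n",
    "second = sketch_replace_guid('ed63d29e-6288-4d66-b10d-8847096fc586')\n",
    "repeat = sketch_replace_guid('cf1b0b29-08ae-4528-839a-5f66eca2cce9')\n",
    "print(first == repeat, first == second)\n",
    "```\n",
    "\n",
    "The cache lives only in memory, so in this sketch a new Python session produces a different mapping."
   ]
  },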
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Obfuscating DataFrames\n",
    "\n",
    "We can use the msticpy pandas extension to obfuscate an entire DataFrame.\n",
    "\n",
    "The obfuscation library contains a mapping for a number of common field names.\n",
    "You can view this list by displaying the attribute:\n",
    "```\n",
    "data_obfus.OBFUS_COL_MAP\n",
    "```\n",
    "\n",
    "In the first example, the TenantId, ResourceGroup and VMName columns have been obfuscated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TenantId</th>\n",
       "      <th>TimeGenerated</th>\n",
       "      <th>FlowStartTime</th>\n",
       "      <th>ResourceGroup</th>\n",
       "      <th>VMName</th>\n",
       "      <th>VMIPAddress</th>\n",
       "      <th>PublicIPs</th>\n",
       "      <th>SrcIP</th>\n",
       "      <th>DestIP</th>\n",
       "      <th>L4Protocol</th>\n",
       "      <th>AllExtIPs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>52b1ab41-869e-4138-9e40-2a4457f09bf0</td>\n",
       "      <td>2019-02-12 14:22:40.697</td>\n",
       "      <td>2019-02-12 13:00:07.000</td>\n",
       "      <td>asihuntomsworkspacerg</td>\n",
       "      <td>msticalertswin1</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[65.55.44.109]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>65.55.44.109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>52b1ab41-869e-4138-9e40-2a4457f09bf0</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>asihuntomsworkspacerg</td>\n",
       "      <td>msticalertswin1</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>52b1ab41-869e-4138-9e40-2a4457f09bf0</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>asihuntomsworkspacerg</td>\n",
       "      <td>msticalertswin1</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.130</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               TenantId            TimeGenerated  \\\n",
       "0  52b1ab41-869e-4138-9e40-2a4457f09bf0  2019-02-12 14:22:40.697   \n",
       "1  52b1ab41-869e-4138-9e40-2a4457f09bf0  2019-02-12 14:22:40.681   \n",
       "2  52b1ab41-869e-4138-9e40-2a4457f09bf0  2019-02-12 14:22:40.681   \n",
       "\n",
       "             FlowStartTime          ResourceGroup           VMName  \\\n",
       "0  2019-02-12 13:00:07.000  asihuntomsworkspacerg  msticalertswin1   \n",
       "1  2019-02-12 13:00:48.000  asihuntomsworkspacerg  msticalertswin1   \n",
       "2  2019-02-12 13:00:48.000  asihuntomsworkspacerg  msticalertswin1   \n",
       "\n",
       "  VMIPAddress                       PublicIPs SrcIP DestIP L4Protocol  \\\n",
       "0    10.0.3.5                  [65.55.44.109]   NaN    NaN          T   \n",
       "1    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "2    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "\n",
       "       AllExtIPs  \n",
       "0   65.55.44.109  \n",
       "1  13.71.172.128  \n",
       "2  13.71.172.130  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TenantId</th>\n",
       "      <th>TimeGenerated</th>\n",
       "      <th>FlowStartTime</th>\n",
       "      <th>ResourceGroup</th>\n",
       "      <th>VMName</th>\n",
       "      <th>VMIPAddress</th>\n",
       "      <th>PublicIPs</th>\n",
       "      <th>SrcIP</th>\n",
       "      <th>DestIP</th>\n",
       "      <th>L4Protocol</th>\n",
       "      <th>AllExtIPs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.697</td>\n",
       "      <td>2019-02-12 13:00:07.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[65.55.44.109]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>65.55.44.109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.130</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               TenantId            TimeGenerated  \\\n",
       "0  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.697   \n",
       "1  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "2  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "\n",
       "             FlowStartTime          ResourceGroup           VMName  \\\n",
       "0  2019-02-12 13:00:07.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "1  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "2  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "\n",
       "  VMIPAddress                       PublicIPs SrcIP DestIP L4Protocol  \\\n",
       "0    10.0.3.5                  [65.55.44.109]   NaN    NaN          T   \n",
       "1    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "2    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "\n",
       "       AllExtIPs  \n",
       "0   65.55.44.109  \n",
       "1  13.71.172.128  \n",
       "2  13.71.172.130  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "display(netflow_df.head(3))\n",
    "netflow_df.head(3).mp_mask.mask()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding custom column mappings\n",
    "\n",
    "Note in the previous example that the VMIPAddress, PublicIPs and AllExtIPs columns were unchanged.\n",
    "\n",
    "We can add these columns to a custom mapping dictionary and re-run the obfuscation.\n",
    "See the later section on [Creating Custom Mappings](#Creating-custom-mappings)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TenantId</th>\n",
       "      <th>TimeGenerated</th>\n",
       "      <th>FlowStartTime</th>\n",
       "      <th>ResourceGroup</th>\n",
       "      <th>VMName</th>\n",
       "      <th>VMIPAddress</th>\n",
       "      <th>PublicIPs</th>\n",
       "      <th>SrcIP</th>\n",
       "      <th>DestIP</th>\n",
       "      <th>L4Protocol</th>\n",
       "      <th>AllExtIPs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.697</td>\n",
       "      <td>2019-02-12 13:00:07.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[65.55.44.109]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>65.55.44.109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.128</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.0.3.5</td>\n",
       "      <td>[13.71.172.130, 13.71.172.128]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>13.71.172.130</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               TenantId            TimeGenerated  \\\n",
       "0  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.697   \n",
       "1  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "2  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "\n",
       "             FlowStartTime          ResourceGroup           VMName  \\\n",
       "0  2019-02-12 13:00:07.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "1  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "2  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "\n",
       "  VMIPAddress                       PublicIPs SrcIP DestIP L4Protocol  \\\n",
       "0    10.0.3.5                  [65.55.44.109]   NaN    NaN          T   \n",
       "1    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "2    10.0.3.5  [13.71.172.130, 13.71.172.128]   NaN    NaN          T   \n",
       "\n",
       "       AllExtIPs  \n",
       "0   65.55.44.109  \n",
       "1  13.71.172.128  \n",
       "2  13.71.172.130  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col_map = {\n",
    "    \"VMName\": \".\",\n",
    "    \"VMIPAddress\": \"ip\", \n",
    "    \"PublicIPs\": \"ip\",\n",
    "    \"AllExtIPs\": \"ip\"\n",
    "}\n",
    "\n",
    "netflow_df.head(3).mp_mask.mask(column_map=col_map)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### obfuscate_df function\n",
    "\n",
    "You can also call the standard function `obfuscate_df` to perform the same operation\n",
    "on the dataframe passed as the `data` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TenantId</th>\n",
       "      <th>TimeGenerated</th>\n",
       "      <th>FlowStartTime</th>\n",
       "      <th>ResourceGroup</th>\n",
       "      <th>VMName</th>\n",
       "      <th>VMIPAddress</th>\n",
       "      <th>PublicIPs</th>\n",
       "      <th>SrcIP</th>\n",
       "      <th>DestIP</th>\n",
       "      <th>L4Protocol</th>\n",
       "      <th>AllExtIPs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.697</td>\n",
       "      <td>2019-02-12 13:00:07.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.112.51.93</td>\n",
       "      <td>[100.11.187.82]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>100.11.187.82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.112.51.93</td>\n",
       "      <td>[144.169.193.140, 144.169.193.144]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>144.169.193.144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>f9ef3428-3ccb-4ecd-8466-dbedc7044293</td>\n",
       "      <td>2019-02-12 14:22:40.681</td>\n",
       "      <td>2019-02-12 13:00:48.000</td>\n",
       "      <td>ibmkajbmepnmiaeilfofa</td>\n",
       "      <td>fmlmbnlpdcbnbnn</td>\n",
       "      <td>10.112.51.93</td>\n",
       "      <td>[144.169.193.140, 144.169.193.144]</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>T</td>\n",
       "      <td>144.169.193.140</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               TenantId            TimeGenerated  \\\n",
       "0  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.697   \n",
       "1  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "2  f9ef3428-3ccb-4ecd-8466-dbedc7044293  2019-02-12 14:22:40.681   \n",
       "\n",
       "             FlowStartTime          ResourceGroup           VMName  \\\n",
       "0  2019-02-12 13:00:07.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "1  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "2  2019-02-12 13:00:48.000  ibmkajbmepnmiaeilfofa  fmlmbnlpdcbnbnn   \n",
       "\n",
       "    VMIPAddress                           PublicIPs SrcIP DestIP L4Protocol  \\\n",
       "0  10.112.51.93                     [100.11.187.82]   NaN    NaN          T   \n",
       "1  10.112.51.93  [144.169.193.140, 144.169.193.144]   NaN    NaN          T   \n",
       "2  10.112.51.93  [144.169.193.140, 144.169.193.144]   NaN    NaN          T   \n",
       "\n",
       "         AllExtIPs  \n",
       "0    100.11.187.82  \n",
       "1  144.169.193.144  \n",
       "2  144.169.193.140  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_obfus.obfuscate_df(data=netflow_df.head(3), column_map=col_map)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating custom mappings\n",
    "\n",
    "A custom mapping dictionary has entries in the following form:\n",
    "```\n",
    "    \"ColumnName\": \"operation\"\n",
    "```\n",
    "\n",
    "The `operation` defines the type of obfuscation method used for that column. Both the column\n",
    "and the operation code must be quoted.\n",
    "\n",
    "|operation code | obfuscation function |\n",
    "|---------------|----------------------|\n",
    "| \"uuid\"        | replace_guid         |\n",
    "| \"ip\"          | hash_ip              |\n",
    "| \"str\"         | hash_string          |\n",
    "| \"dict\"        | hash_dict            |\n",
    "| \"list\"        | hash_list            |\n",
    "| \"sid\"         | hash_sid             |\n",
    "| \"null\"        | \"null\"\\*             |\n",
    "| None          | hash_string\\*        |\n",
    "| delims_str    | hash_item\\*          |\n",
    "\n",
    "\\*The last three items require some explanation:\n",
    "- null - the `null` operation code means set the value to empty - i.e. delete the value\n",
    "  in the output frame.\n",
    "- None (i.e. the dictionary value is `None`) defaults to hash_string.\n",
    "- delims_str - any string other than those named above is assumed to be a string of delimiters.\n",
    "  See next section for a discussion of use of delimiters.\n",
    "\n",
    "---\n",
    "\n",
    "> **NOTE** If you want to *only* use custom mappings and ignore the builtin<br>\n",
    "> mapping table, specify `use_default=False` as a parameter to either<br>\n",
    "> `mp_mask.mask()` or `obfuscate_df`.\n",
    "---"
   ]
  },
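  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The operation-code table above can be read as a simple lookup. The sketch below only restates\n",
    "those rules in plain Python so you can see how a mapping would be interpreted. It is not\n",
    "MSTICPy's implementation: `describe_operation`, `NAMED_OPERATIONS` and the column names\n",
    "are hypothetical and exist only for this illustration.\n",
    "\n",
    "```python\n",
    "# Illustrative only: restates the operation-code table; not MSTICPy code.\n",
    "NAMED_OPERATIONS = {\n",
    "    \"uuid\": \"replace_guid\",\n",
    "    \"ip\": \"hash_ip\",\n",
    "    \"str\": \"hash_string\",\n",
    "    \"dict\": \"hash_dict\",\n",
    "    \"list\": \"hash_list\",\n",
    "    \"sid\": \"hash_sid\",\n",
    "    \"null\": \"null\",\n",
    "}\n",
    "\n",
    "\n",
    "def describe_operation(operation):\n",
    "    \"\"\"Return the obfuscation method the table assigns to an operation code.\"\"\"\n",
    "    if operation is None:\n",
    "        # A value of None falls back to plain string hashing\n",
    "        return \"hash_string\"\n",
    "    if operation in NAMED_OPERATIONS:\n",
    "        return NAMED_OPERATIONS[operation]\n",
    "    # Any other string is treated as a string of delimiters\n",
    "    return f\"hash_item (delimiters: {operation!r})\"\n",
    "\n",
    "\n",
    "# Hypothetical column names, chosen only for this example\n",
    "example_map = {\n",
    "    \"SubscriptionId\": \"uuid\",\n",
    "    \"ClientIP\": \"ip\",\n",
    "    \"Comment\": None,\n",
    "    \"UserPrincipalName\": \"@.\",\n",
    "    \"Notes\": \"null\",\n",
    "}\n",
    "\n",
    "for column, operation in example_map.items():\n",
    "    print(f\"{column}: {describe_operation(operation)}\")\n",
    "```\n",
    "\n",
    "A dictionary shaped like `example_map` is what you would pass as the `column_map` parameter."
   ]
  },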
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using `hash_item` with delimiters to preserve the structure/look of the hashed input\n",
    "\n",
    "Using hash_item with a delimiters string lets you create output that somewhat resembles the input\n",
    "type. The delimiters string is specified as a simple string of delimiter characters, e.g. `\"@\\,-\"`\n",
    "\n",
    "The input string is broken into substrings using each of the delimiters in the delims_str. The substrings\n",
    "are individually hashed and the resulting substrings joined together using the original delimiters.\n",
    "The string is split in the order of the characters in the delims string.\n",
    "\n",
    "This allows you to create hashed values that bear some resemblance to the original structure of the string.\n",
    "This might be useful for email addresses, qualified domain names and other structured text.\n",
    "\n",
    "For example:\n",
    "    ian@mydomain.com\n",
    "    \n",
    "Using the simple `hash_string` function the output bears no resemblance to an email address."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'prqocjmdpbodrafn'"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hash_string(\"ian@mydomain.com\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `hash_item` and specifying the expected delimiters we get something like an email address in the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'bnm@blbbrfbk.pjb'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hash_item(\"ian@mydomain.com\", \"@.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You use `hash_item` in your Custom Mapping dictionary by specifying a delimiters string as the `operation`."
   ]
  },
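  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The split, hash and rejoin procedure described in this section can be sketched in a few lines of\n",
    "plain Python. This is a minimal illustration under stated assumptions, not the library's code:\n",
    "`sketch_hash` and `sketch_hash_item` are hypothetical names, and the SHA-256-to-letters\n",
    "mapping is a stand-in for whatever hashing MSTICPy uses, so the output will not match\n",
    "the values shown above.\n",
    "\n",
    "```python\n",
    "import hashlib\n",
    "\n",
    "\n",
    "def sketch_hash(text):\n",
    "    \"\"\"Stand-in hash: turn a SHA-256 digest into lowercase letters.\"\"\"\n",
    "    digest = hashlib.sha256(text.encode(\"utf-8\")).hexdigest()\n",
    "    # Map each hex digit (0-15) to a letter in the range a-p\n",
    "    letters = \"\".join(chr(ord(\"a\") + int(char, 16)) for char in digest)\n",
    "    # Keep the output the same length as the input (up to 64 characters)\n",
    "    return letters[: len(text)]\n",
    "\n",
    "\n",
    "def sketch_hash_item(text, delims=\"\"):\n",
    "    \"\"\"Hash the parts of text between delimiters, keeping the delimiters.\"\"\"\n",
    "    if not delims:\n",
    "        return sketch_hash(text)\n",
    "    first, rest = delims[0], delims[1:]\n",
    "    # Split on the first delimiter, then apply the remaining delimiters\n",
    "    # to each part, and rejoin with the original delimiter.\n",
    "    return first.join(sketch_hash_item(part, rest) for part in text.split(first))\n",
    "\n",
    "\n",
    "print(sketch_hash_item(\"ian@mydomain.com\", \"@.\"))\n",
    "```\n",
    "\n",
    "Because the delimiters are processed in order, `\"@.\"` splits on `@` first and then on `.`\n",
    "within each part, so the result keeps the `name@domain.tld` shape."
   ]
  },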
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking Your Obfuscation\n",
    "\n",
    "You should check that you have correctly masked all of the columns needed. \n",
    "There is a function `check_obfuscation` to do this.\n",
    "\n",
    "Use `silent=False` to print out the results.\n",
    "If you use `silent=True` (the default), it returns two lists: the `unchanged` and\n",
    "`obfuscated` columns.\n",
    "\n",
    "```\n",
    "data_obfus.check_obfuscation(\n",
    "    data: pandas.core.frame.DataFrame,\n",
    "    orig_data: pandas.core.frame.DataFrame,\n",
    "    index: int = 0,\n",
    "    silent=True,\n",
    ") -> Union[Tuple[List[str], List[str]], NoneType]\n",
    "\n",
    "Check the obfuscation results for a row.\n",
    "Parameters\n",
    "----------\n",
    "data : pd.DataFrame\n",
    "    Obfuscated DataFrame\n",
    "orig_data : pd.DataFrame\n",
    "    Original DataFrame\n",
    "index : int, optional\n",
    "    The row to check, by default 0\n",
    "silent: bool\n",
    "    If True the function prints no output and\n",
    "    returns lists of unchanged and changed columns.\n",
    "    By default, True\n",
    "\n",
    "Returns\n",
    "-------\n",
    "Optional[Tuple[List[str], List[str]]] :\n",
    "    If silent is True returns a tuple of unchanged, changed\n",
    "    items. If False, returns None.\n",
    "```\n",
    "\n",
    "> **Note** by default this will check only the first row of the data.\n",
    "> You can check other rows using the index parameter.\n",
    "\n",
    "> **Warning** The two DataFrames should have a matching index and ordering because\n",
    "> the check works by comparing the values in each column, judging that\n",
    "> column values that do not match have been obfuscated.\n",
    "\n",
    "**We first test the partially-obfuscated DataFrame from earlier.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "===== Start Check ====\n",
      "Unchanged columns:\n",
      "------------------\n",
      "AllExtIPs: 65.55.44.109\n",
      "FlowStartTime: 2019-02-12 13:00:07.000\n",
      "L4Protocol: T\n",
      "PublicIPs: ['65.55.44.109']\n",
      "TimeGenerated: 2019-02-12 14:22:40.697\n",
      "VMIPAddress: 10.0.3.5\n",
      "\n",
      "Obfuscated columns:\n",
      "--------------------\n",
      "DestIP:   nan ----> nan\n",
      "ResourceGroup:   asihuntomsworkspacerg ----> ibmkajbmepnmiaeilfofa\n",
      "SrcIP:   nan ----> nan\n",
      "TenantId:   52b1ab41-869e-4138-9e40-2a4457f09bf0 ----> f9ef3428-3ccb-4ecd-8466-dbedc7044293\n",
      "VMName:   msticalertswin1 ----> fmlmbnlpdcbnbnn\n",
      "====== End Check =====\n"
     ]
    }
   ],
   "source": [
    "partly_obfus_df = netflow_df.head(3).mp_mask.mask()\n",
    "fully_obfus_df = netflow_df.head(3).mp_mask.mask(column_map=col_map)\n",
    "\n",
    "data_obfus.check_obfuscation(partly_obfus_df, netflow_df.head(3), silent=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Checking the fully-obfuscated data set**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "===== Start Check ====\n",
      "Unchanged columns:\n",
      "------------------\n",
      "FlowStartTime: 2019-02-12 13:00:07.000\n",
      "L4Protocol: T\n",
      "TimeGenerated: 2019-02-12 14:22:40.697\n",
      "\n",
      "Obfuscated columns:\n",
      "--------------------\n",
      "AllExtIPs:   65.55.44.109 ----> 100.11.187.82\n",
      "DestIP:   nan ----> nan\n",
      "PublicIPs:   ['65.55.44.109'] ----> ['100.11.187.82']\n",
      "ResourceGroup:   asihuntomsworkspacerg ----> ibmkajbmepnmiaeilfofa\n",
      "SrcIP:   nan ----> nan\n",
      "TenantId:   52b1ab41-869e-4138-9e40-2a4457f09bf0 ----> f9ef3428-3ccb-4ecd-8466-dbedc7044293\n",
      "VMIPAddress:   10.0.3.5 ----> 10.112.51.93\n",
      "VMName:   msticalertswin1 ----> fmlmbnlpdcbnbnn\n",
      "====== End Check =====\n"
     ]
    }
   ],
   "source": [
    "data_obfus.check_obfuscation(fully_obfus_df, netflow_df.head(3), silent=False)"
   ]
  },
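  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The value-by-value comparison described in the warning above can be sketched as follows.\n",
    "This is an illustration only, not the code behind `check_obfuscation`: `sketch_check_row` is a\n",
    "hypothetical helper, and rows are represented as plain dictionaries rather than DataFrame rows.\n",
    "\n",
    "```python\n",
    "def sketch_check_row(obfuscated_row, original_row):\n",
    "    \"\"\"Return (unchanged, changed) column names for one pair of rows.\"\"\"\n",
    "    unchanged, changed = [], []\n",
    "    for column in sorted(original_row):\n",
    "        # A column counts as obfuscated when its value no longer matches\n",
    "        if obfuscated_row[column] == original_row[column]:\n",
    "            unchanged.append(column)\n",
    "        else:\n",
    "            changed.append(column)\n",
    "    return unchanged, changed\n",
    "\n",
    "\n",
    "# Values taken from the first row of the check output above\n",
    "original = {\"VMIPAddress\": \"10.0.3.5\", \"L4Protocol\": \"T\", \"SrcIP\": float(\"nan\")}\n",
    "masked = {\"VMIPAddress\": \"10.112.51.93\", \"L4Protocol\": \"T\", \"SrcIP\": float(\"nan\")}\n",
    "\n",
    "unchanged, changed = sketch_check_row(masked, original)\n",
    "print(\"Unchanged:\", unchanged)\n",
    "print(\"Changed:\", changed)\n",
    "```\n",
    "\n",
    "In Python a NaN value never compares equal to itself, so `SrcIP` lands in the changed list here.\n",
    "That may be why `SrcIP` and `DestIP` are listed as `nan ----> nan` under the obfuscated columns\n",
    "in the output above, although this is an inference from the output rather than documented behavior."
   ]
  },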
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "## Appendix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import tabulate\n",
    "# print(tabulate.tabulate(netflow_df.head(3), tablefmt=\"rst\", showindex=False, headers=\"keys\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
